Network modeling plays a critical role in identifying statistical regularities and structural principles
common to many systems. The large majority of recent modeling approaches are connectivity
driven. The structural patterns of the network are at the basis of the mechanisms ruling the network
formation. Connectivity driven models necessarily provide a time-aggregated representation
that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address
this challenge by defining the activity potential, a time invariant function characterizing the
agents' interactions, and constructing an activity driven model capable of encoding the instantaneous
time description of the network dynamics. The model provides an explanation of structural
features such as the presence of hubs, which simply originate from the heterogeneous activity of
agents. Within this framework, highly dynamical networks can be described analytically, allowing
a quantitative discussion of the biases induced by the time-aggregated representations in the
analysis of dynamical processes.

Network modeling [1-3] has long drawn on the tradition of social network analysis and graph theory, with models
ranging from the Erdős-Rényi model to Logit models, p*-models, and Markov random graphs [4-11]. In the last decade,
the class of growing network models, exemplified by the preferential attachment model, has been made widely popular
by research in statistical physics and computer science [12-17]. All these models can be defined as connectivity driven,
as the network's topology is at the core of the model's algorithmic definition. Connectivity-driven network models are
well-suited for capturing the essential features of systems such as the Internet, where connections among nodes are
long-lived elements [18, 19]. However, in many cases the interactions among the elements of the system are rapidly
changing and are characterized by processes whose timing and duration are defined on a very short time scale [20, 21].
This limit has been investigated in the case of adaptive systems whose structure evolves while coupled to the process
taking place on top of them [22-26]. Instead, the understanding of this limit in time-varying networks in which the
structure evolves independently of the process is still limited. In these activity-driven networks,
models intended to capture the process of accumulating connections over time and the resulting degree distribution
(i.e. the probability that a node has k connections to other nodes) and other topological properties merely represent
a time-integrated perspective of the system. Furthermore, the analysis of dynamical processes in evolving networks
is generally performed in the presence of a time-scale separation between the network evolution and the dynamical
process unfolding on its structure. In one limit we can consider the network as quenched in its connectivity pattern,
thus evolving on a time scale that is much longer than that of the dynamical process itself. In the other limiting case, the
network is evolving on a time scale much shorter than that of the dynamical process, thus effectively disappearing from the
definition of the interaction among individuals, which is conveniently replaced by effective random couplings. While the
time-scale separation is extremely convenient for the numerical and analytical tractability of the models, networks
generally evolve on a time scale that might be comparable to that of the dynamical process [27-32]. An accurate
modeling of the dynamics of activity-driven networks calls for the definition of interaction processes based on the
actual measurement of the activity of the agents forming the system, a task now made possible by the availability
of time-resolved, high-quality data on the instantaneous activity of millions of agents in a wide variety of networks
[33-37].

I. RESULTS

Here we present the analysis of three large-scale, time-resolved network datasets and define for each node
a measurable quantity, the activity potential, characterizing its interaction pattern within the network. This
measure is defined as the number of interactions performed, in a given time window, by each node divided by
the total number of interactions made by all the nodes in the same time window. We find that the system-level
dynamics of the network can be encoded by the activity potential distribution function, from which it is possible to
derive the appropriate interaction rate among nodes. On the basis of the empirically measured activity potential
distribution we propose a process model for the generation of random dynamic networks. The activity potential
function defines the network structure in time and traces back the origin of hubs to the heterogeneous activity
of the network elements. The model allows us to write dynamical equations coupling the network dynamics and
the dynamical processes unfolding on its structure without relying on any time-scale separation approximation.
We analyze a simple spreading process and provide the explicit analytical expression for the biases introduced
by the time-aggregated representation of the network when studying dynamical processes occurring on a time
scale comparable to that of the network evolution. Interestingly, the network model presented here is amenable
to the introduction of many features in the nodes' dynamics, such as the persistence of specific interactions or
assortative/disassortative correlations, thus defining a general basic modeling framework for rapidly evolving networks.

FIG. 1 Network visualization and degree distribution of the PRL dataset considering three different aggregated views. In
particular, in the first two rows we focus on the set of authors who wrote at least one paper in the period between 1960 and
1974. For this subset of 5,162 active authors we construct three different networks, graphically represented in the central row
of the figure. The upper row represents a blown-up perspective of a particular network region. In the left column we show
the network of 1974, defined by the active nodes in the given time frame. The central column shows the network obtained
by integrating over 10 years, from 1974 to 1984. In the right column we show the network obtained by integrating over 30
years, from 1974 to 2004. The first network is highly fragmented, as is obvious from the visualization. When larger windows are
integrated, the density of the network increases and heterogeneous connectivity patterns start to emerge. Clearly, as indicated
by the degree distributions, which consider the complete set of authors (not just those used for the sake of visualization in the
first two rows), the time scale used to construct the network affects its topological structure. In each visualization the size and
color of the nodes are proportional to their degree.



A. The activity potential. 

We consider three datasets corresponding to networks in which we can measure the individual agents' activity:
collaborations in the journal "Physical Review Letters" (PRL) published by the American Physical Society [38],
messages exchanged over the Twitter microblogging network, and the activity of actors in movies and TV series
as recorded in the Internet Movie Database (IMDb) [35]. In the first dataset the network representation considers
undirected links connecting two PRL authors if they have collaborated in writing one article. In the second system
each node is a Twitter user and an undirected link is drawn if at least one message has been exchanged between
two users. Finally, the actor network is obtained by drawing an undirected link between any two actors who have
participated in the same movie or TV series.

Simple evidence for the role of agents' activity in network analysis and modeling can be readily observed in the
case of the collaboration network of scientific authors [40]. The number of collaborations of any author depends on
the time window through which we observe the system. In Fig. 1 we show the networks obtained by time-aggregated
co-authorships over 1, 10, and 30 years for the subset of authors in the PRL dataset who were active in the considered
time period. Clearly, the time scale used to construct the network defines a non-stationary connectivity pattern and
explicitly affects the network structure and its degree distribution. Similar results are found for the other two datasets
as shown in the Supplementary Information.
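
The dependence of the degree on the observation window can be illustrated with a short sketch. The event format `(t, i, j)`, the function name `aggregated_degrees`, and the toy events are assumptions made for this example; they are not taken from the PRL data.

```python
from collections import defaultdict

def aggregated_degrees(events, t_start, t_end):
    """Degree of each node in the network aggregated over [t_start, t_end).

    events: iterable of (t, i, j) tuples (hypothetical format), one per
    interaction between nodes i and j at time t.
    """
    neighbors = defaultdict(set)
    for t, i, j in events:
        if t_start <= t < t_end and i != j:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

# Toy co-authorship events (year, author, author); purely illustrative.
toy_events = [
    (1974, "a", "b"),
    (1975, "a", "c"),
    (1980, "a", "d"),
    (1980, "b", "c"),
    (1990, "a", "e"),
]
deg_1y = aggregated_degrees(toy_events, 1974, 1975)
deg_10y = aggregated_degrees(toy_events, 1974, 1985)
deg_30y = aggregated_degrees(toy_events, 1974, 2005)
```

In this toy example the degree of node `"a"` grows with the length of the window, which is the qualitative effect described above.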

In the three datasets considered, we characterize the individual activity of every agent: papers written, messages
exchanged, or movie appearances, respectively. For each dataset we measure the individual activity of each agent and
define the activity potential $x_i$ of the agent $i$ as the number of interactions that he/she performs in a characteristic
time window of given length $\Delta t$, divided by the total number of interactions made by all agents during the same
time window. The activity potential $x_i$ thus estimates the probability that the agent $i$ was involved in any given
interaction in the system, and the probability distribution $F(x)$ that a randomly chosen agent $i$ has activity potential
$x$ statistically defines the interaction dynamics of the system. In Fig. 2 we show the cumulative distribution $F_c(x)$
evaluated for the three datasets. In all cases we find that, contrary to the degree distribution and other structural
characteristics of the networks, the distribution $F_c(x)$ is virtually independent of the time scale over which the activity
potential is measured. Additionally, we find that the distribution $F_c(x)$ is skewed and fairly broadly distributed.
This is hardly surprising, as in many cases measurements of human activity have confirmed the presence of wide
variability across individuals [41, 42].
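
As an illustration of the definition above, the following minimal sketch computes activity potentials from time-stamped interactions. The event format `(t, i, j)`, the function name `activity_potential`, and the choice to count an interaction once for each of its two participants are assumptions made for this example, not details taken from the datasets.

```python
from collections import Counter

def activity_potential(events, t_start, dt):
    """Return {node: x_i} for the time window [t_start, t_start + dt).

    x_i is the number of interactions involving node i in the window,
    divided by the total number of interactions counted over all nodes,
    so that the values sum to 1.
    """
    counts = Counter()
    for t, i, j in events:
        if t_start <= t < t_start + dt:
            counts[i] += 1  # assumption: both participants are credited
            counts[j] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {node: c / total for node, c in counts.items()}

# Toy interaction log; the last event falls outside the window.
log = [(0, "a", "b"), (1, "a", "c"), (2, "b", "c"), (3, "a", "d"), (12, "c", "d")]
x = activity_potential(log, t_start=0, dt=10)
```

Repeating the call with different values of `dt` and comparing the resulting distributions is one way to probe the time-scale independence reported for $F_c(x)$.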



B. Activity driven network model. 

Our empirical analysis naturally leads to the definition of a simple model that uses the activity distribution to drive
the formation of a dynamic network. We consider $N$ nodes (agents) and assign to each node $i$ an activity/firing rate
$a_i = \eta x_i$, defined as the probability per unit time to create new contacts or interactions with other individuals, where
$\eta$ is a rescaling factor defined such that the average number of active nodes per unit time in the system is $\eta \langle x \rangle N$.
The activity rates are defined such that the numbers $x_i$ are bounded in the interval $\epsilon < x_i < 1$, and are assigned
according to a given probability distribution $F(x)$ that may be chosen arbitrarily or given by empirical data. We
impose a lower cut-off $\epsilon$ on $x$ in order to avoid possible divergences of $F(x)$ close to the origin. We assume a simple
generative process according to the following rules (see Fig. 2-D):

• At each discrete time step $t$ the network $G_t$ starts with $N$ disconnected vertices;

• With probability $a_i \Delta t$ each vertex $i$ becomes active and generates $m$ links that are connected to $m$ other
randomly selected vertices. Non-active nodes can still receive connections from other active vertices;

• At the next time step $t + \Delta t$, all the edges in the network $G_t$ are deleted. From this definition it follows that
all interactions have a constant duration $\tau_i = \Delta t$.

The above model is random and Markovian in the sense that agents do not have memory of the previous time steps.
The full dynamics of the network and its ensuing structure are thus completely encoded in the activity potential
distribution $F(x)$.
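
The generative rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the power-law form $F(x) \propto x^{-\gamma}$ is sampled by inverse transform, the activation probability $a_i \Delta t$ is capped at 1, and the parameter values in the usage lines (including `eps=1e-3`) are chosen only for the example.

```python
import random

def sample_activity_potentials(n, gamma, eps, rng):
    """Draw n values of x from F(x) ~ x**(-gamma) on [eps, 1] (gamma != 1)."""
    lo = eps ** (1.0 - gamma)
    exponent = 1.0 / (1.0 - gamma)
    return [(lo + rng.random() * (1.0 - lo)) ** exponent for _ in range(n)]

def instantaneous_network(x, eta, m, dt, rng):
    """One time step: return the set of undirected edges (i, j) with i < j."""
    n = len(x)
    edges = set()
    for i in range(n):
        # Node i fires with probability a_i * dt, where a_i = eta * x_i.
        if rng.random() < min(1.0, eta * x[i] * dt):
            # Pick m distinct targets among the other n - 1 nodes.
            for j in rng.sample(range(n - 1), m):
                if j >= i:
                    j += 1  # skip over i itself
                edges.add((min(i, j), max(i, j)))
    return edges

def integrated_network(x, eta, m, dt, steps, rng):
    """Union of the instantaneous networks over `steps` time steps."""
    union = set()
    for _ in range(steps):
        # Edges of the previous step are discarded; only the union is kept.
        union |= instantaneous_network(x, eta, m, dt, rng)
    return union

rng = random.Random(42)
potentials = sample_activity_potentials(1000, gamma=2.8, eps=1e-3, rng=rng)
snapshot = instantaneous_network(potentials, eta=10.0, m=2, dt=1.0, rng=rng)
aggregate = integrated_network(potentials, eta=10.0, m=2, dt=1.0, steps=20, rng=rng)
```

Because no state is carried from one step to the next except the sampled potentials, the sketch is memoryless in the same sense as the model described above.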

In Fig. 3 we report the results of numerical simulations of a network with $N = 5000$, $m = 2$, $\eta = 10$, and $F(x) \propto x^{-\gamma}$,
with $\gamma = 2.8$ and $\epsilon = 10^{-[\ldots]}$. The model recovers the same qualitative behavior observed in Fig. 1. At each time step
the network is a simple random graph with low average connectivity. The accumulation of connections that we observe
by measuring activity on increasingly larger time slices $T$ generates a skewed $P_T(k)$ degree distribution with a broad
variability. The presence of heterogeneities and hubs (nodes with a large number of connections) is due to the wide
variation of activity rates in the system and the associated highly active agents. However, it is worth remarking that
hub formation has a different interpretation than in growing network prescriptions, such as preferential attachment.
In those cases hubs are created by a positional advantage in degree space leading to the passive attraction of more
and more connections. In our model, the creation of hubs results from the presence of nodes with high activity rate,
which are more willing to repeatedly engage in interactions.

FIG. 2 Cumulative distribution of the activity potential, $F_c(x)$, empirically measured by using four different time windows and
a schematic representation of the proposed network model. In particular, in panel (A) we show the cumulative distributions of
the observables $x$ for Twitter, in panel (B) for IMDb, and in panel (C) for PRL. In panel (D) we show a schematic representation
of the model. Considering just 13 nodes and $m = 3$, we plot a visualization of the resulting networks for 3 different time steps.
The red nodes represent the firing/active nodes. The final visualization represents the network after integration over all time
steps.

The model allows for a simple analytical treatment. We define the integrated network $G_T = \bigcup_{t=0}^{T} G_t$ as the union
of all the networks obtained in each previous time step. The instantaneous network generated at each time $t$ will be
composed of a set of slightly interconnected nodes corresponding to the agents that were active at that particular
time, plus those who received connections from active agents. Each active node will create $m$ links and the total edges
per unit time are $E_t = m N \eta \langle x \rangle$, yielding the average degree per unit time, i.e. the contact rate of the network,

$$\langle k \rangle_t = \frac{2 E_t}{N} = 2 m \eta \langle x \rangle. \tag{1}$$

The instantaneous network will be composed of a set of stars, the vertices that were active at that time step, with
degree larger than or equal to $m$, plus some vertices with low degree. The corresponding integrated network, on the
other hand, will generally not be sparse, being the union of all the instantaneous networks at previous times (see
Fig. 3). In fact, for large time $T$ and network size $N$, when the degree in the integrated network can be approximated
by a continuous variable, we can show (see Supplementary Information) that agent $i$ will have at time $T$ a degree in
the integrated network given by $k_i(T) = N \left(1 - e^{-T m \eta x_i / N}\right)$. It can then easily be shown that the degree distribution
$P_T(k)$ of the integrated network at time $T$ takes the form:

$$P_T(k) \sim F\left[\frac{k}{T m \eta}\right], \tag{2}$$



where we have considered the limit of small $k/N$ and $k/T$ (i.e. large network size and times). The noticeable
result here is the relation between the degree distribution of the integrated network and the distribution of individual
activity, which, from the previous equation, share the same functional form. This relation is approximately recovered
in the empirical data, where the activity potential distribution is in reasonable agreement with the appropriately
rescaled asymptotic degree distribution of the corresponding network (see Fig. 4-A). As expected, differences between
the two distributions are present, due to features of the real network dynamics that our random model does not
capture: links might have memory (already explored connections are more likely to happen again), social relations
have a lifetime distribution (persistence), and multiple connections and weighted links may be relevant. None of
these effects is considered in the model. We report some statistical analysis of those features in the Supplementary
Information as further ingredients to be considered in future extensions of the model.

FIG. 3 Visualization and degree distributions of the proposed network model considering different aggregated views. We fix
$N = 5000$, $m = 2$, $\eta = 10$, $F(x) \propto x^{-\gamma}$ with $\gamma = 2.8$, $\epsilon < x < 1$ with $\epsilon = 10^{-[\ldots]}$. We plot the network obtained after one
time step in the first column, the network obtained after integrating over 10 iterations in the second column, and the network
obtained after integrating over 20 iterations in the last column. Interestingly, even though the model is random and Markovian
by construction, we observe a behavior qualitatively similar to the case of PRL: the single time window yields a sparse and
poorly connected network with a trivial degree distribution. When larger time scales are considered, heterogeneous connectivity
patterns start to emerge, as seen in the corresponding degree distributions. In each visualization the size and color of the nodes
are proportional to their degree.
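
The relation between activity and integrated degree can be checked numerically with a small sketch. It assumes the expression $k_i(T) = N(1 - e^{-T m \eta x_i / N})$ for the degree of agent $i$ in the integrated network; the function names and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def integrated_degree(x, T, m, eta, N):
    """Approximate degree at time T of a node with activity potential x."""
    return N * (1.0 - math.exp(-T * m * eta * x / N))

def small_degree_limit(x, T, m, eta):
    """Linearized form valid when k/N is small: k ~ T * m * eta * x."""
    return T * m * eta * x

# For a low-activity node the two expressions nearly coincide ...
k_exact = integrated_degree(0.01, T=10, m=2, eta=10.0, N=5000)
k_linear = small_degree_limit(0.01, T=10, m=2, eta=10.0)

# ... while for very long times the degree saturates at the network size N.
k_saturated = integrated_degree(1.0, T=10**6, m=2, eta=10.0, N=5000)
```

In the linear regime the degree is proportional to $x$, which is why the degree distribution of the integrated network inherits the functional form of $F(x)$ after rescaling by $T m \eta$.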



C. Dynamical processes in activity driven networks. 



FIG. 4 In panel (A) we consider the entire Twitter dataset and show the distribution of activity potential $F(x)$ and the
asymptotic degree distribution of the corresponding network, $P_T(k)$, with $k$ rescaled by the factor $1/(T \eta m)$ according to the analytical
result. In panel (B) we show the density of infected nodes, $I_\infty$, in the stationary state, obtained from numerical simulations of
the SIS model on a network generated according to the proposed model and two other networks resulting from an integration of
the model over 20 and 40 time steps, respectively. We set $N = 10^{[\ldots]}$, $m = 5$, $\eta = 10$, $F(x) \propto x^{-\gamma}$ with $\gamma = 2.1$ and $\epsilon < x < 1$.
Each point represents an average over 10 independent simulations. The red triangle marks the critical reproductive
number $R_0$ as predicted by Eq. (4).

Recent research has highlighted the key role of interaction dynamics as opposed to static studies. For example, an
individual who appears to be central by traditional network metrics may in fact be the last to be infected because of
the timing of his/her interactions [30, 43]. Analogously, the concurrency of sexual partners can dramatically accelerate
the spread of STDs [31]. Despite its simplicity, our model makes it analytically explicit that the actors' activity time
scale plays a major role in the understanding of processes unfolding on dynamical networks. Let us consider the
susceptible-infected-susceptible (SIS) epidemic compartmental model [1, 2, 44, 45]. In this model, infected individuals
can propagate the disease to healthy neighbors with probability $\lambda$, while infected individuals recover with rate $\mu$
and become susceptible again. In a homogeneous population the behavior of the epidemics is controlled by the
reproductive number $R_0 = \beta/\mu$, where $\beta = \lambda \langle k \rangle$ is the per capita spreading rate that takes into account the rate of
contacts of each individual. The reproductive number identifies the average number of secondary cases generated by 
a primary case in an entirely susceptible population and defines the epidemic threshold such that only if $R_0 > 1$ can
epidemics reach an endemic state and spread into a closed population. In the past few years the inclusion of complex
connectivity networks and mobility schemes into the substrate of spreading processes (contagion, diffusion, transfer,
etc.) has highlighted new and interesting results [46-50]. Several results state that the epidemic threshold depends
on the topological properties of the networks. In particular, for networks characterized by a fixed, quenched topology
the threshold is determined by the principal eigenvalue of the adjacency matrix [48, 49]. Instead, for annealed networks,
characterized by a topology defined just on average because the connectivity pattern has dynamics that are extremely fast
with respect to the dynamical process, heterogeneous mean-field approaches [2, 3] predict an epidemic threshold that
is inversely proportional to the second moment of the network's degree distribution: $\beta/\mu > \langle k \rangle^2 / \langle k^2 \rangle$. However, these
results do not apply to the case in which the time variation of the connectivity pattern is occurring on the same time 
scale as the dynamical process. Our model presents simple evidence of this problem, as a disease with a small value
of $\mu^{-1}$ (the infectious period characteristic time) will have time to explore the fully-integrated network, but will not
spread on the dynamic instantaneous networks whose union defines the integrated one [30, 31, 43, 51]. In Fig. 4-B
we plot the results of numerical simulations of the SIS model on a network generated according to our model and on 
two time-aggregated network instances. We observe that the two aggregated networks lead to misleading results in 
both the threshold and the epidemic magnitude as a function of $\beta/\mu$. Even if the epidemic threshold discounts the
different average degree of the networks in the factor $\beta = \lambda \langle k \rangle$, the two aggregated instances consider all edges as
always available to carry the contagion process, disregarding the fact that the edges may be active or not according 
to a specific time sequence defined by the agents' activity. 
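
A discrete-time SIS process running directly on the instantaneous networks can be sketched as follows. This is an illustrative toy under stated assumptions, not the simulation code behind the reported results: the function name is hypothetical, the time step is fixed to $\Delta t = 1$, states are updated synchronously, and the recovery rate is treated as a per-step probability.

```python
import random

def sis_on_activity_driven(a, m, lam, mu, steps, n_seeds, rng):
    """Return the density of infected nodes after `steps` time steps.

    a: list of per-node activation probabilities per step (a_i * dt, dt = 1).
    lam: per-contact infection probability; mu: per-step recovery probability.
    """
    n = len(a)
    infected = [False] * n
    for i in rng.sample(range(n), n_seeds):
        infected[i] = True
    for _ in range(steps):
        nxt = list(infected)
        # Recovery: infected nodes become susceptible with probability mu.
        for i in range(n):
            if infected[i] and rng.random() < mu:
                nxt[i] = False
        # Contacts: each active node creates m links that last one step only.
        for i in range(n):
            if rng.random() < min(1.0, a[i]):
                for j in rng.sample(range(n - 1), m):
                    if j >= i:
                        j += 1  # skip over i itself
                    # The infection can travel in either direction of the link.
                    if infected[i] != infected[j] and rng.random() < lam:
                        nxt[j if infected[i] else i] = True
        infected = nxt
    return sum(infected) / n

rng = random.Random(7)
density = sis_on_activity_driven([0.2] * 500, m=5, lam=0.5, mu=0.1,
                                 steps=200, n_seeds=10, rng=rng)
```

Because links are regenerated at every step, an edge can carry the infection only while it exists, which is the ingredient that the time-aggregated networks discard.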

The above finding can be more precisely quantified by calculating analytically the epidemic threshold in activity
driven networks without relying on any time-aggregated view of the network connectivity. By working with the activity
rate we can derive epidemic evolution equations in which the spreading process and the network dynamics are coupled
together. Let us assume a distribution of activity potential $x$ of nodes given by a general distribution $F(x)$ as before.
At a mean-field level, the epidemic process will be characterized by the number of infected individuals in the class of
activity rate $a$, at time $t$, namely $I_a^t$. The number of infected individuals of class $a$ at time $t + \Delta t$ is given by:

$$I_a^{t+\Delta t} = I_a^t - \mu \Delta t\, I_a^t + \lambda m (N_a^t - I_a^t)\, a \Delta t \int da' \frac{I_{a'}^t}{N} + \lambda m (N_a^t - I_a^t) \int da' \frac{I_{a'}^t\, a' \Delta t}{N}, \tag{3}$$

where $N_a^t$ is the total number of individuals with activity $a$. In Eq. (3), the third term on the right side takes into
account the probability that a susceptible of class $a$ is active and acquires the infection getting a connection from any
other infected individual (summing over all different classes), while the last term takes into account the probability
that a susceptible, independently of his activity, gets a connection from any infected active individual. The above
equation can be solved as shown in the Methods section, yielding the following epidemic threshold for
the activity driven model:

$$\frac{\beta}{\mu} > \frac{2 \langle a \rangle}{\langle a \rangle + \sqrt{\langle a^2 \rangle}}. \tag{4}$$

This result considers the activity rate of each actor and therefore takes into account the actual dynamics of
interactions; the above formula does not depend on the time-aggregated network representation and provides the
epidemic threshold as a function of the interaction rate of the nodes. This allows us to characterize the spreading
condition on the natural time scale of the combination of the network and spreading process evolution.
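
The threshold can be evaluated directly from a list of activity rates. The sketch below reads Eq. (4) as $\beta/\mu > 2\langle a \rangle / (\langle a \rangle + \sqrt{\langle a^2 \rangle})$, which is what the Methods derivation gives after substituting $\beta = \lambda \langle k \rangle$ and $\langle k \rangle = 2 m \langle a \rangle$; the function name and the sample activity lists are hypothetical.

```python
import math

def activity_driven_threshold(activities):
    """Critical value of beta/mu: 2<a> / (<a> + sqrt(<a^2>))."""
    n = len(activities)
    mean_a = sum(activities) / n
    mean_a2 = sum(a * a for a in activities) / n
    return 2.0 * mean_a / (mean_a + math.sqrt(mean_a2))

# Homogeneous activity: sqrt(<a^2>) equals <a>, so the threshold is 1.
homogeneous = activity_driven_threshold([0.3] * 10)

# A single very active node among many quiet ones lowers the threshold.
heterogeneous = activity_driven_threshold([0.01] * 99 + [1.0])
```

The comparison shows how heterogeneity in the activity rates, through the second moment $\langle a^2 \rangle$, makes it easier for the epidemic to reach an endemic state.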



II. DISCUSSION 

We have presented a model of dynamical networks that encodes the connectivity pattern in a single function, the 
activity potential distribution, that can be empirically measured in real world networks for which longitudinal data 
are available. This function allows the definition of a simple dynamical process based on the nodes' activity rate, 
providing a time dependent description of the network's connectivity pattern. Despite its simplicity, the model can 
be used to solve analytically the co-evolution of the network and contagion processes and characterize quantitatively 
the biases generated by time-scale separation techniques. Furthermore, the proposed model appears to be suited
as a testbed to discuss the effect of network dynamics on other processes such as damage resilience, discovery and 
data mining, collective behavior and synchronization. While we have reduced the level of realism for the sake of 
parsimony of the presented model, we are aware of the importance of analyzing other features of actor activity such 
as concurrency, persistence and different weights associated with each connection. These features must necessarily 
be added to the model in order to remove the limitations set by the simple random network structures generated 
here and represent interesting challenges for future work in this area. 



III. METHODS 

A. Datasets 

We considered three different datasets: the collaborations in the journal "Physical Review Letters" (PRL) published
by the APS, the messages exchanged on Twitter, and the activity of actors in movies and TV series as recorded in the
Internet Movie Database (IMDb). In particular:

PRL dataset. In this database the network representation considers each author of a PRL article as a node.
An undirected link between two different authors is drawn if they collaborated on the same article. We filter out
all the articles with more than 10 authors in order to focus our attention just on small collaborations in which we
can assume that the social component is relevant. We consider the period between 1960 and 2004. In this time
window we registered 71,583 active nodes and 261,553 connections among them. In this dataset it is natural to define
the activity rate, $a$, of each author as the number of papers written in a specific time window $\Delta t = 1$ year. Authors
with no collaborative papers in the total time span considered (isolates) are not included in the dataset.

Twitter Dataset. Having been granted temporary access to Twitter's firehose, we mined the stream for over 6
months to identify a large sample of active user accounts. Using the API, we then queried for the complete history
of 3 million users, resulting in a total of over 380 million individual tweets covering almost 4 years of user activity on
Twitter. In this database the network representation considers each user as a node. An undirected link between two
different users is drawn if they exchanged at least one message. We focus our attention on 9 months during 2008.
In this time window we registered 531,788 active nodes and 2,566,398 connections among them. In this dataset we
define the activity rate of each user as the number of messages sent in a time window $\Delta t = 1$ day.

IMDb Dataset. In this database the network representation considers each actor as a node. An undirected
link between two different actors is drawn if they collaborated in the same movie/TV series. We focus on the period
between 1950 and 2010. During this time period we registered 1,273,631 active nodes and 47,884,882 connections
between them. A natural way to define the activity rate in this dataset is to consider the number of movies in which
each actor appeared in a specific time window $\Delta t = 1$ year.



B. Epidemic threshold 

In order to solve Eq. (3) we can consider the total number of infectious nodes in the system

$$\int da\, I_a^{t+\Delta t} = I^{t+\Delta t} = I^t - \mu \Delta t\, I^t + \lambda m \langle a \rangle \Delta t\, I^t + \lambda m \theta^t \Delta t, \tag{5}$$

where $\theta^t = \int da'\, I_{a'}^t a'$ and we have dropped all second order terms in the activity rate $a$ and in $I_a^t$. We are not
considering events in which two infected nodes choose each other for connection and we are considering a linear
approximation in $I_a^t$, since at the beginning of the epidemic the number of infectious individuals in each class is small.
In order to obtain a closed expression for $\theta$ we multiply both sides of Eq. (3) by $a$ and integrate over the whole activity
spectrum, obtaining the equation

$$\theta^{t+\Delta t} = \theta^t - \mu \Delta t\, \theta^t + \lambda m \langle a^2 \rangle I^t \Delta t + \lambda m \langle a \rangle \theta^t \Delta t. \tag{6}$$

In the continuous time limit we obtain the following closed system of equations

$$\partial_t I = -\mu I + \lambda m \langle a \rangle I + \lambda m \theta, \tag{7}$$

$$\partial_t \theta = -\mu \theta + \lambda m \langle a^2 \rangle I + \lambda m \langle a \rangle \theta, \tag{8}$$

whose Jacobian matrix has eigenvalues

$$\Lambda_{(1,2)} = \langle a \rangle \lambda m - \mu \pm \lambda m \sqrt{\langle a^2 \rangle}. \tag{9}$$

The epidemic threshold for the system is obtained by requiring the largest eigenvalue to be larger than 0, which leads to
the condition for the presence of an endemic state:

$$\frac{\lambda}{\mu} > \frac{1}{m} \frac{1}{\langle a \rangle + \sqrt{\langle a^2 \rangle}}. \tag{10}$$

From this last expression we can recover the epidemic threshold of Eq. (4) by considering $\beta = \lambda \langle k \rangle$, $a_i = \eta x_i$ and
$\langle k \rangle = 2 m \eta \langle x \rangle$.
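
The eigenvalue expression can be checked numerically. The sketch below assumes the linearized system $\partial_t I = -\mu I + \lambda m \langle a \rangle I + \lambda m \theta$, $\partial_t \theta = -\mu \theta + \lambda m \langle a^2 \rangle I + \lambda m \langle a \rangle \theta$ and compares the eigenvalues of its 2x2 Jacobian, computed from trace and determinant, with the closed form $\langle a \rangle \lambda m - \mu \pm \lambda m \sqrt{\langle a^2 \rangle}$. The function names and numerical values are hypothetical.

```python
import math

def jacobian_eigenvalues(lam, mu, m, mean_a, mean_a2):
    """Eigenvalues (largest first) of the Jacobian of the (I, theta) system."""
    j11 = -mu + lam * m * mean_a
    j12 = lam * m
    j21 = lam * m * mean_a2
    j22 = -mu + lam * m * mean_a
    half_trace = (j11 + j22) / 2.0
    det = j11 * j22 - j12 * j21
    disc = math.sqrt(max(0.0, half_trace * half_trace - det))
    return half_trace + disc, half_trace - disc

def closed_form(lam, mu, m, mean_a, mean_a2):
    """Closed-form eigenvalues: <a> lam m - mu +/- lam m sqrt(<a^2>)."""
    base = mean_a * lam * m - mu
    spread = lam * m * math.sqrt(mean_a2)
    return base + spread, base - spread

params = dict(lam=0.2, mu=0.1, m=5, mean_a=0.05, mean_a2=0.01)
numeric = jacobian_eigenvalues(**params)
analytic = closed_form(**params)

# Endemic state requires the largest eigenvalue to be positive, i.e.
# lam / mu > 1 / (m * (<a> + sqrt(<a^2>))).
critical_ratio = 1.0 / (params["m"] * (params["mean_a"] + math.sqrt(params["mean_a2"])))
above_threshold = params["lam"] / params["mu"] > critical_ratio
```

For these illustrative values the largest eigenvalue is positive and the ratio $\lambda/\mu$ exceeds the critical value, so the two formulations of the threshold condition agree.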